Load the four datasets (Audi, BMW, Mercedes and Volkswagen) and bind them into a single one, adding a manufacturer column:
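library(tidyverse) # readr, dplyr, purrr, forcats, ggplot2
library(glue)      # glue() for building the file paths

df <- map2( # map over file and manufacturer names, reading one data frame each
  c("audi", "bmw", "merc", "vw"),             # file names
  c("Audi", "BMW", "Mercedes", "Volkswagen"), # manufacturer labels
  function(filename, manufacturer) {
    read_csv(glue("./data/{filename}.csv"),
      # model, year, price, transmission, mileage, fuelType, tax, mpg, engineSize
      col_types = "fiififidd"
    ) %>%
      mutate(manufacturer = as_factor(manufacturer)) # add manufacturer column
  }
) %>%
  bind_rows() # bind the four data frames into a single one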
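For comparison, the same read, tag and bind pattern can be sketched in base R with `Map()` and `do.call(rbind, ...)`. This is a swapped-in technique, not the code used above. The tiny inline CSV strings, which are passed through `read.csv(text = ...)`, are hypothetical stand-ins for the files in `./data/`.

```r
# Hypothetical inline CSV contents standing in for ./data/audi.csv and ./data/bmw.csv
csv_by_manufacturer <- list(
  Audi = "model,year,price\nA3,2017,15000\nA4,2018,18000",
  BMW  = "model,year,price\n1 Series,2016,14000"
)

# Read each CSV and tag its rows with the manufacturer name
tagged <- Map(
  function(csv_text, manufacturer) {
    d <- read.csv(text = csv_text, stringsAsFactors = FALSE)
    d$manufacturer <- manufacturer
    d
  },
  csv_by_manufacturer,
  names(csv_by_manufacturer)
)

# Stack the per-manufacturer data frames into one and make manufacturer a factor
cars <- do.call(rbind, unname(tagged))
cars$manufacturer <- factor(cars$manufacturer, levels = names(csv_by_manufacturer))
print(cars)
```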
Draw a random sample of 5000 cars, fixing the seed for reproducibility:
set.seed(19990428)
df <- df %>%
slice_sample(n = 5000)
Prefix each model with its manufacturer, compute each car's age, convert year and engineSize to factors, and add auxiliary factor variables that split the numeric ones into groups of roughly equal size:
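df <- df %>% mutate(
  model = as_factor(paste0(manufacturer, " - ", model)), # prefix model with manufacturer
  age = 2020 - year,                           # 2020 is the most recent year in the sample
  aux_price = cut_number(price / 1000, 4),     # equal-count groups, in thousands
  aux_mileage = cut_number(mileage / 1000, 4), # equal-count groups, in thousands
  aux_mpg = cut_number(mpg, 4),
  aux_tax = cut_number(tax, 2),                # only 2 groups: tax's median and Q3 are both 145
  aux_age = cut_number(age, 4),
  year = as_factor(year),                      # converted after age has been computed
  engineSize = as_factor(engineSize)
)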
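`cut_number()` forms groups with roughly equal numbers of observations by cutting at sample quantiles. The base R sketch below imitates that idea with `quantile()` and `cut()`. It approximates ggplot2's behaviour and is not its source code. It also suggests why `tax` gets only two groups: when quantiles coincide (the summary table below reports a median and a Q3 of 145), four distinct bins cannot be formed, and heavy ties leave the groups unequal in size. The toy `tax` values are made up.

```r
# Equal-count binning: cut x at its sample quantiles (a sketch of the
# idea behind ggplot2::cut_number, not its actual implementation)
equal_count_bins <- function(x, n) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = n + 1), names = FALSE)
  if (anyDuplicated(breaks)) {
    stop("tied quantiles: cannot form ", n, " distinct bins")
  }
  cut(x, breaks, include.lowest = TRUE)
}

# Made-up tax values with heavy ties at 145
tax <- c(0, 125, 145, 145, 145, 145, 145, 570)

halves <- equal_count_bins(tax, 2)
print(table(halves)) # ties make the two groups unequal in size

# Four bins fail because the median and the upper quartile are both 145
quartiles <- try(equal_count_bins(tax, 4), silent = TRUE)
print(inherits(quartiles, "try-error"))
```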
| Variable | N | Mean | SD | Min | Q1 | Median | Q3 | Max |
|---|---|---|---|---|---|---|---|---|
| price | 5000 | 21571.45 | 11544.02 | 1295.0 | 13990.0 | 19498.0 | 26030.0 | 149948.0 |
| mileage | 5000 | 23054.10 | 22309.69 | 1.0 | 5904.0 | 16500.0 | 33297.0 | 168000.0 |
| tax | 5000 | 123.60 | 62.56 | 0.0 | 125.0 | 145.0 | 145.0 | 570.0 |
| mpg | 5000 | 54.19 | 18.11 | 1.1 | 45.6 | 53.3 | 61.4 | 470.8 |
| age | 5000 | 2.78 | 2.10 | 0.0 | 1.0 | 3.0 | 4.0 | 19.0 |
Frequencies of the categorical variables (year, model and engineSize omitted):
| Variable | Level | N | % |
|---|---|---|---|
| transmission | Manual | 1784 | 35.7 |
| | Automatic | 1332 | 26.6 |
| | Semi-Auto | 1884 | 37.7 |
| | Other | 0 | 0.0 |
| fuelType | Petrol | 2065 | 41.3 |
| | Diesel | 2860 | 57.2 |
| | Hybrid | 65 | 1.3 |
| | Other | 10 | 0.2 |
| | Electric | 0 | 0.0 |
| manufacturer | Audi | 1072 | 21.4 |
| | BMW | 1106 | 22.1 |
| | Mercedes | 1340 | 26.8 |
| | Volkswagen | 1482 | 29.6 |
Frequencies of the auxiliary factor variables:

| Variable | Level | N | % |
|---|---|---|---|
| aux_price | [1.29,14] | 1254 | 25.1 |
| | (14,19.5] | 1252 | 25.0 |
| | (19.5,26] | 1244 | 24.9 |
| | (26,150] | 1250 | 25.0 |
| aux_mileage | [0.001,5.9] | 1251 | 25.0 |
| | (5.9,16.5] | 1252 | 25.0 |
| | (16.5,33.3] | 1247 | 24.9 |
| | (33.3,168] | 1250 | 25.0 |
| aux_mpg | [1.1,45.6] | 1338 | 26.8 |
| | (45.6,53.3] | 1291 | 25.8 |
| | (53.3,61.4] | 1188 | 23.8 |
| | (61.4,471] | 1183 | 23.7 |
| aux_tax | [0,145] | 3969 | 79.4 |
| | (145,570] | 1031 | 20.6 |
| aux_age | [0,1] | 1888 | 37.8 |
| | (1,3] | 1453 | 29.1 |
| | (3,4] | 871 | 17.4 |
| | (4,19] | 788 | 15.8 |
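If we count the missing (NA) values per variable, we find that there are no explicit NAs in the sample, as the table below shows. The table also counts zeros, which may hide implicit missing values, such as an engineSize of 0: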
| Variable | Missing | Zeros |
|---|---|---|
| model | 0 | 0 |
| year | 0 | 0 |
| price | 0 | 0 |
| transmission | 0 | 0 |
| mileage | 0 | 0 |
| fuelType | 0 | 0 |
| tax | 0 | 152 |
| mpg | 0 | 0 |
| engineSize | 0 | 13 |
| manufacturer | 0 | 0 |
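A table like this can be built by counting, for every column, the `NA`s and the values equal to zero. The base R sketch below shows one way to do it. It is not necessarily how the table above was produced. Comparing `as.character(col) == "0"` lets the same check work on numeric columns and on factors such as `engineSize`, whose levels are stored as text. The toy data frame is made up.

```r
# Count explicit NAs and zeros per column of a data frame
count_missing_and_zeros <- function(df) {
  data.frame(
    Variable = names(df),
    Missing = vapply(df, function(col) sum(is.na(col)), integer(1)),
    # as.character() makes the zero check work for numbers and factor levels alike
    Zeros = vapply(df, function(col) sum(as.character(col) == "0", na.rm = TRUE), integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Made-up example with one NA in tax and one in model
toy <- data.frame(
  tax = c(0, 145, NA, 0),
  engineSize = factor(c("2", "0", "1.5", "2")),
  model = c("A3", "X1", "Golf", NA),
  stringsAsFactors = FALSE
)
res <- count_missing_and_zeros(toy)
print(res)
```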
There are no severe outliers.
The QQ plots are shown below.
QQ plots
We perform a Durbin-Watson test with the null hypothesis that the autocorrelation of the disturbances is 0. We obtain a p-value of 0.95, so we fail to reject the null hypothesis: there is no evidence of first-order autocorrelation.
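The Durbin-Watson statistic is d = sum((e_t - e_{t-1})^2) / sum(e_t^2), computed over the residuals e_t. It is close to 2 when consecutive residuals are uncorrelated (d is approximately 2(1 - r1), where r1 is the lag-1 autocorrelation). It moves toward 0 under positive autocorrelation and toward 4 under negative autocorrelation. The sketch below computes only the statistic, on simulated data. The p-value reported above also requires the statistic's null distribution, which packages such as lmtest (`dwtest()`) provide. The model the test was run on is not shown here, so the regression below is purely illustrative.

```r
# Durbin-Watson statistic: sum of squared successive differences of the
# residuals divided by their sum of squares
durbin_watson <- function(residuals) {
  sum(diff(residuals)^2) / sum(residuals^2)
}

# Simulated regression with independent errors (illustrative only)
set.seed(1)
x <- runif(500)
y <- 3 + 2 * x + rnorm(500)
fit <- lm(y ~ x)

d <- durbin_watson(resid(fit))
print(d) # should be close to 2 for independent errors
```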
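The result of the test is consistent with a visual reading of the ACF plot¹ for price shown below. All autocorrelations except the one at lag 33 lie within the 95% confidence band. Since roughly one lag in twenty is expected to fall outside such a band by chance alone, this single exceedance gives no evidence of autocorrelation.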
ACF plot for price
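When there is no autocorrelation, each sample autocorrelation falls outside the approximate 95% band ±1.96/√n about 5% of the time. Across dozens of lags, one or two exceedances are therefore expected by chance. The sketch below checks this on simulated white noise of the same length as the sample (n = 5000). For that length, the default `lag.max` of `acf()` should work out to 36 lags. The white-noise series is an assumption for illustration and is not the price data.

```r
# Simulated white noise of the same length as the sample (illustrative only)
set.seed(42)
n <- 5000
noise <- rnorm(n)

a <- acf(noise, plot = FALSE)  # default lag.max = floor(10 * log10(n))
r <- a$acf[-1]                 # drop lag 0, which is always 1
band <- qnorm(0.975) / sqrt(n) # approximate 95% band drawn by plot.acf

outside <- sum(abs(r) > band)
expected <- 0.05 * length(r)   # exceedances expected by chance alone
cat("lags:", length(r), " outside band:", outside, " expected:", expected, "\n")
```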
Spearman correlation plot
| | price | mileage | tax | mpg | age |
|---|---|---|---|---|---|
| price | 1.00 | -0.64 | 0.39 | -0.56 | -0.69 |
| mileage | -0.64 | 1.00 | -0.25 | 0.43 | 0.85 |
| tax | 0.39 | -0.25 | 1.00 | -0.59 | -0.29 |
| mpg | -0.56 | 0.43 | -0.59 | 1.00 | 0.41 |
| age | -0.69 | 0.85 | -0.29 | 0.41 | 1.00 |
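Spearman's correlation is Pearson's correlation computed on ranks. As a result, it measures how monotone a relation is rather than how linear it is. The short base R check below uses made-up, perfectly monotone but non-linear values. Spearman's rho is exactly -1 on these values, while Pearson's r on the raw values is not.

```r
# Made-up values: price falls monotonically but not linearly with age
age <- c(0, 1, 2, 3, 5, 8, 12, 19)
price <- c(30, 26, 22, 19, 14, 9, 5, 2)

spearman <- cor(age, price, method = "spearman")
pearson_on_ranks <- cor(rank(age), rank(price))
pearson_raw <- cor(age, price)
print(c(spearman = spearman, pearson_on_ranks = pearson_on_ranks, pearson_raw = pearson_raw))
```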
| Variable | R2 |
|---|---|
| year | 0.48 |
| transmission | 0.29 |
| engineSize | 0.39 |
| aux_mileage | 0.34 |
| aux_age | 0.42 |
| aux_mpg | 0.26 |
| manufacturer | 0.10 |
| fuelType | 0.01 |
| outlier | 0.00 |
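We assume the R² reported for a categorical variable is the share of the response's variance explained by that variable's groups. This quantity is the correlation ratio η², the between-group sum of squares divided by the total sum of squares, and it equals the R² of a one-way ANOVA. The table does not state the response or the method, so the sketch below only illustrates this quantity on made-up prices and checks it against `lm()`.

```r
# Correlation ratio (eta squared): between-group sum of squares / total sum of squares
eta_squared <- function(y, group) {
  group_means <- ave(y, group) # each observation replaced by its group mean
  sum((group_means - mean(y))^2) / sum((y - mean(y))^2)
}

# Made-up prices (thousands) for three transmission types
transmission <- factor(c("Manual", "Manual", "Automatic", "Automatic", "Semi-Auto", "Semi-Auto"))
price <- c(12, 14, 20, 24, 22, 26)

eta2 <- eta_squared(price, transmission)
r2_lm <- summary(lm(price ~ transmission))$r.squared
print(c(eta2 = eta2, r2_lm = r2_lm))
```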
| Variable | Estimate | p.value |
|---|---|---|
| engineSize=6.2 | 1.92 | 0.00 |
| engineSize=1.9 | -1.86 | 0.00 |
| year=2001 | -1.82 | 0.00 |
| engineSize=1.7 | -1.71 | 0.00 |
| engineSize=5.2 | 1.68 | 0.00 |
| year=2002 | -1.35 | 0.00 |
| engineSize=2.7 | -1.34 | 0.00 |
| year=2020 | 1.29 | 0.00 |
| engineSize=6 | 1.29 | 0.00 |
| year=2019 | 1.21 | 0.00 |
| engineSize=3.7 | -1.15 | 0.04 |
| engineSize=5.5 | 1.11 | 0.00 |
| year=2018 | 1.00 | 0.00 |
| engineSize=4 | 0.98 | 0.00 |
| year=2005 | -0.90 | 0.00 |
| engineSize=1.2 | -0.85 | 0.00 |
| engineSize=4.4 | 0.84 | 0.00 |
| year=2017 | 0.79 | 0.00 |
| engineSize=2.9 | 0.77 | 0.00 |
| year=2007 | -0.64 | 0.00 |
| engineSize=1 | -0.63 | 0.00 |
| engineSize=1.8 | -0.63 | 0.00 |
| year=2016 | 0.63 | 0.00 |
| year=2015 | 0.53 | 0.00 |
| year=2010 | -0.46 | 0.00 |
| aux_age=[0,1] | 0.46 | 0.00 |
| engineSize=1.6 | -0.44 | 0.00 |
| engineSize=1.4 | -0.42 | 0.00 |
| aux_age=(4,19] | -0.42 | 0.00 |
| year=2009 | -0.40 | 0.00 |
| aux_mileage=(33.3,168] | -0.40 | 0.00 |
| aux_mpg=[1.1,45.6] | 0.40 | 0.00 |
| year=2006 | -0.38 | 0.00 |
| year=2014 | 0.38 | 0.00 |
| aux_mileage=[0.001,5.9] | 0.36 | 0.00 |
| transmission=Manual | -0.36 | 0.00 |
| engineSize=3 | 0.32 | 0.00 |
| year=2008 | -0.29 | 0.00 |
| engineSize=2.3 | 0.27 | 0.04 |
| manufacturer=Volkswagen | -0.25 | 0.00 |
| aux_mpg=(61.4,471] | -0.24 | 0.00 |
| year=2013 | 0.24 | 0.00 |
| transmission=Semi-Auto | 0.23 | 0.00 |
| fuelType=Hybrid | 0.22 | 0.00 |
| engineSize=2.1 | -0.20 | 0.02 |
| aux_mpg=(53.3,61.4] | -0.16 | 0.00 |
| aux_mileage=(5.9,16.5] | 0.16 | 0.00 |
| aux_age=(3,4] | -0.14 | 0.00 |
| manufacturer=Mercedes | 0.12 | 0.00 |
| transmission=Automatic | 0.12 | 0.00 |
| aux_mileage=(16.5,33.3] | -0.12 | 0.00 |
| year=2012 | 0.10 | 0.00 |
| year=2011 | 0.09 | 0.00 |
| outlier=FALSE | 0.08 | 0.00 |
| outlier=TRUE | -0.08 | 0.00 |
| fuelType=Petrol | -0.08 | 0.00 |
| engineSize=1.3 | 0.07 | 0.00 |
| manufacturer=Audi | 0.07 | 0.00 |
| engineSize=2 | -0.07 | 0.00 |
| manufacturer=BMW | 0.05 | 0.00 |
| fuelType=Diesel | -0.01 | 0.00 |
| year=2004 | 0.00 | 0.00 |
Linear regression of price on age:

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 29687.28 | 229.64 | 129.28 | < 0.001 |
| age | -2921.89 | 65.98 | -44.28 | < 0.001 |
| Statistic | Value |
|---|---|
| Residual standard error | 9784.206 on 4998 degrees of freedom |
| Multiple R-squared | 0.281792 |
| Adjusted R-squared | 0.2816483 |
| F-statistic | 1960.987 on 1 and 4998 DF |
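The fitted line can be read off the coefficient table above. Two quick consistency checks apply to the reported, rounded values. First, an OLS line passes through the point of means (mean age 2.78, mean price 21571.45, taken from the summary table). Second, in a simple regression the F-statistic equals the square of the slope's t value. Each additional year of age lowers the predicted price by about 2922.

```latex
\begin{aligned}
\widehat{\text{price}} &= 29687.28 - 2921.89\,\text{age} \\
\widehat{\text{price}}\big|_{\text{age} = 2.78} &= 29687.28 - 2921.89 \times 2.78 \approx 21564.43 \approx 21571.45 = \overline{\text{price}} \\
t_{\text{age}}^2 &= (-44.28)^2 \approx 1960.72 \approx 1960.987 = F
\end{aligned}
```

The small gaps come from rounding the mean age and the t value.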
Linear regression of log(price) on age:

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 10.30 | 0.01 | 1224.34 | < 0.001 |
| age | -0.16 | 0.00 | -66.11 | < 0.001 |
| Statistic | Value |
|---|---|
| Residual standard error | 0.3585727 on 4998 degrees of freedom |
| Multiple R-squared | 0.4665263 |
| Adjusted R-squared | 0.4664196 |
| F-statistic | 4370.785 on 1 and 4998 DF |
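We assume `log` is the natural logarithm. The intercept supports this: exp(10.30) ≈ 29733, close to the intercept of the price model. On that assumption, exponentiating the slope gives exp(-0.16) ≈ 0.852, so each additional year multiplies the fitted price by about 0.85. That is a loss of roughly 15% of value per year. The sketch below simulates cars that lose an assumed 15% per year with multiplicative noise and shows that `lm(log(price) ~ age)` recovers that rate. All simulated numbers are made up.

```r
# Simulated cars losing an assumed 15% of their value per year, with
# multiplicative (log-normal) noise; all numbers are illustrative
set.seed(7)
age <- sample(0:10, 2000, replace = TRUE)
price <- 30000 * 0.85^age * exp(rnorm(2000, sd = 0.3))

fit <- lm(log(price) ~ age)
retention <- exp(coef(fit)[["age"]]) # fraction of value kept per extra year
print(retention)                     # close to the assumed 0.85

# The same back-transformation applied to the rounded coefficients reported above
reported <- exp(c(intercept = 10.30, age = -0.16))
print(round(reported, 3))
```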
¹ Lag 0 is omitted from the ACF plot for clarity.